Abstract. This paper proposes a closed-form optimal estimator based on the theory
of estimating functions for a class of linear ARCH models. The estimating function
(EF) estimator has the advantage over the widely used maximum likelihood (ML) and
quasi-maximum likelihood (QML) estimators that (i) it can be easily implemented, (ii)
it does not depend on a distributional assumption for the innovation, and (iii) it does
not require the use of any numerical optimization procedures or the choice of initial
values of the conditional variance equation. In the case of normality, the asymptotic
distributions of the ML and QML estimators naturally turn out to be identical and,
hence, coincide with ours. Moreover, a robustness property of the EF estimator is
derived by means of an influence function. Simulation results show that the efficiency
benefits of our estimator relative to the ML and QML estimators are substantial for
some ARCH innovation distributions.

Keywords: ARCH process; least squares estimation; estimating function approach

1. Introduction

Since the seminal papers of Engle (1982) and Bollerslev (1986), autoregressive conditional
heteroskedasticity (ARCH) and generalised ARCH (GARCH) models have been proposed for
modelling time series with non-constant conditional volatility. Since then, these models have
become perhaps the most popular and extensively studied financial econometric models (see,
e.g., Engle (1995); Gourieroux (1997); Mikosch (2003); Francq and Zakoian (2004)). The
literature on the subject is so vast that we restrict ourselves to directing the reader to the
fairly comprehensive reviews by Bollerslev et al. (1992) and Shephard (1996). An excellent
survey of the GARCH methodology in finance is also available in Bauwens et al. (2006).

ARCH model estimation can be achieved using a variety of methods such as conditional
least squares (LS) estimation (Tjøstheim (1986)), maximum likelihood (ML) estimation under
the assumption of conditional normality, quasi-maximum likelihood (QML) estimation
(Weiss (1986); Francq and Zakoian (2004)), and generalized method of moments (GMM)
estimation (e.g., Rich et al. (1991)). As is well known, LS, QML and GMM estimation methods
yield inefficient and possibly biased estimates relative to ML estimators when the true
innovation distribution is known (see, for example, Li and Turtle (2000)). However, the possibility
of misspecification of the likelihood function for ML and QML estimators motivates our
investigation of an alternative estimation method for ARCH models.

The purpose of this paper is to propose an estimator based on the estimating function
(EF) approach for ARCH models that improves efficiency without any distributional
assumptions for the innovation. This EF estimator admits a closed-form expression which is
computationally simple and compares favorably with the ML and QML estimators. Moreover,
the EF estimator naturally turns out to have the same limiting distribution as the
ML and QML estimators and hence is also fully efficient when the innovation distribution
is Gaussian. It is interesting to note that many standard results in the estimation of ARCH
models based on conditional normality are recoverable under the EF approach.

The rest of the paper is organized as follows. Section 2 describes the conditional LS and
EF estimation procedures for ARCH models. In addition, the asymptotic efficiency of the
EF estimator relative to the LS, ML and QML estimators is discussed. In particular, the
lower bound of the asymptotic variance of the LS estimator is formulated. In Section 3,
a robustness property of the EF estimator is studied by means of an influence function. In
Section 4, we perform an experiment to examine the behavior of the EF, LS, ML and QML
estimators in terms of mean square error for a small and a large sample of observations. The
study demonstrates that the efficiency benefits of the EF estimator relative to the LS, ML
and QML estimators are substantial for some ARCH innovation distributions. Proofs are
collected in Section 5.



2. Estimating function formulation and efficiency 



In this section, we describe the problem of estimation for a class of ARCH(p) models
characterized by the equations

$$X_t = \sigma_t(\theta_0)\varepsilon_t, \qquad \sigma_t^2(\theta_0) = \omega_0 + \sum_{j=1}^{p} \alpha_{0j} X_{t-j}^2, \qquad t = 1, \ldots, n, \tag{1}$$

where $\{\varepsilon_t\}$ is a sequence of independent, identically distributed random variables such that
$E\varepsilon_t = 0$, $E\varepsilon_t^2 = 1$, $\theta_0 = (\omega_0, \alpha_{01}, \ldots, \alpha_{0p})^T$ is an unknown vector of true parameters satisfying
$\omega_0 > 0$, $\alpha_{0j} \ge 0$, $j = 1, \ldots, p-1$, and $\varepsilon_t$ is independent of $X_s$, $s < t$. Henceforth, it is
tacitly assumed that $\alpha_{0p} > 0$ so that the model is of order $p$. We also assume that model (1) is
stationary and ergodic. When $p = 1$, Nelson (1990) showed that a sufficient condition for
stationarity is $E(\log(\alpha_{01}\varepsilon_t^2)) < 0$. For a general ARCH(p) model, Bougerol and Picard
(1992) showed that it has a unique non-anticipative strictly stationary solution.
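The stationarity condition for $p = 1$ can be checked numerically. The following is a minimal sketch, assuming standard normal innovations; the function name `arch1_log_moment` and the sample size are illustrative choices, not part of the paper.

```python
import numpy as np

def arch1_log_moment(alpha, eps):
    """Sample estimate of E[log(alpha * eps_t^2)] for an ARCH(1) coefficient alpha."""
    return float(np.mean(np.log(alpha * eps ** 2)))

# Assumption: Gaussian innovations; any zero-mean, unit-variance sample could be used.
rng = np.random.default_rng(0)
eps = rng.standard_normal(200_000)

for alpha in (0.1, 0.3, 1.0, 4.0):
    value = arch1_log_moment(alpha, eps)
    # A negative value indicates that the sufficient condition holds for this alpha.
    print(f"alpha={alpha}: estimated E log(alpha*eps^2) = {value:.3f}, condition holds: {value < 0}")
```

For Gaussian innovations the condition is much weaker than the moment conditions needed later for the LS estimator, so values of $\alpha_{01}$ that pass this check may still violate Assumption 1.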

We now turn to the conditional least squares estimation of model (1). Write
$Y_t = (1, X_t^2, \ldots, X_{t-p+1}^2)^T$ and $\eta_t = (\varepsilon_t^2 - 1)\sigma_t^2(\theta_0)$. Then the standard linear autoregressive
representation is given by

$$X_t^2 = \theta_0^T Y_{t-1} + \eta_t. \tag{2}$$

Suppose that an observed stretch $X_1^2, \ldots, X_n^2$ is available. The vector of parameters is
$\theta = (\omega, \alpha_1, \ldots, \alpha_p)^T$, which belongs to a compact parameter space $\Theta \subset (0, \infty) \times [0, \infty)^p$, and
$\theta_0 \in \Theta$. Let



Qn{e) = J2{x^ - E{x^\:Ft-^)r = Y.{xl - 9-Y,.,f 



t=i t=i 

be the penalty function, where !Ft — c{-^s'^ — Then from the linear regression theory, 
we can define the conditional least squares (LS) estimator of 9 by 

^) = argming„(^) = {Y^Y)-^YX, (3) 

where Y is the matrix of order n x (1 + p) with tth row Yt_i and X — (Xf, . . . ,X^)^. 
Note that (3) does not take into account the nuisance parameter associated with variance 
and, hence, serves only as an initial estimator. The asymptotic validity of (3) can be easily 
established using an appropriate central limit theorem, and as expected, its efficiency is 
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smaller than that of the QML estimator. 
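The closed form in (3) is an ordinary regression of $X_t^2$ on its own lags. Below is a minimal sketch under stated assumptions: Gaussian innovations, zero start-up values followed by a burn-in, and hypothetical helper names `simulate_arch` and `ls_estimator` that do not come from the paper.

```python
import numpy as np

def simulate_arch(omega, alphas, n, rng, burn=500):
    """Return p presample values followed by n squared ARCH(p) observations X_t^2."""
    alphas = np.asarray(alphas, dtype=float)
    p = len(alphas)
    x2 = np.zeros(n + burn + p)              # zero start-up values, washed out by the burn-in
    for t in range(p, n + burn + p):
        lags = x2[t - p:t][::-1]             # (X_{t-1}^2, ..., X_{t-p}^2)
        sigma2 = omega + alphas @ lags
        x2[t] = sigma2 * rng.standard_normal() ** 2
    return x2[-(n + p):]

def ls_estimator(x2, p):
    """Conditional LS estimator (Y'Y)^{-1} Y'X, with rows Y_{t-1}' = (1, X_{t-1}^2, ..., X_{t-p}^2)."""
    n = len(x2) - p
    Y = np.column_stack([np.ones(n)] + [x2[p - j:p - j + n] for j in range(1, p + 1)])
    X = x2[p:]
    return np.linalg.solve(Y.T @ Y, Y.T @ X)

rng = np.random.default_rng(1)
x2 = simulate_arch(1.0, [0.2], 20_000, rng)
print(ls_estimator(x2, 1))                   # estimates of (omega, alpha_1)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; nothing constrains the estimate to lie in the parameter space, so small samples can return a negative $\hat\alpha$.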

To describe the limiting distribution of (3), we impose an additional condition on $\theta_0$ and
the moments of $\varepsilon_t$. Recall that we have assumed that the process is stationary and ergodic.
Introduce the notation $\Sigma_4$ for the matrix used in Assumption 1 below, where $\otimes$ denotes the tensor
product.

Assumption 1

$E|\varepsilon_t|^8 < \infty$ and $\|\Sigma_4\| < 1$,

where $\|\cdot\|$ is the spectral matrix norm. In the case when $p = 1$ and $\{\varepsilon_t\}$ is Gaussian, it
is seen that $\|\Sigma_4\| < 1$ implies $\alpha_{01} < 105^{-1/4} \approx 0.3$. The following lemma establishes the
asymptotic distribution of (3).

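Lemma 1. Suppose that the assumptions of model (1) and Assumption 1 hold. Then

$$\sqrt{n}(\hat\theta_n^{(LS)} - \theta_0) \xrightarrow{d} N(0, \mathrm{Var}(\varepsilon_t^2)\, \mathcal{U}^{-1}\mathcal{R}(\theta_0)\,\mathcal{U}^{-1}) \quad \text{as } n \to \infty,$$

where the matrices $\mathcal{U} = E(Y_{t-1}Y_{t-1}^T)$ and $\mathcal{R}(\theta_0) = E(\sigma_t^4(\theta_0) Y_{t-1}Y_{t-1}^T)$ are positive definite
with bounded elements.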

Remark 1. Note that Assumption 1 ensures that $\mathcal{U}$ and $\mathcal{R}(\theta_0)$ are finite. When the
errors are standard normal, a necessary and sufficient condition for the existence of higher
moments of $X_t$ in terms of the parameter $\theta_0$ is given by Engle (1982, Theorems 1 and 2).
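The numerical bound quoted with Assumption 1 can be reproduced with a short calculation. The sketch below assumes that, for $p = 1$, the norm condition reduces to $\alpha_{01}^4 E\varepsilon_t^8 < 1$; this reading is an inference from the constant 105, which is the eighth moment of a standard normal variable, and is not spelled out in the text.

```python
from math import prod

# Eighth moment of a standard normal variable: 7!! = 1 * 3 * 5 * 7.
gauss_eighth_moment = prod(range(1, 8, 2))

# Assumed scalar form of the norm condition: alpha^4 * E[eps^8] < 1.
alpha_bound = gauss_eighth_moment ** -0.25

print(gauss_eighth_moment, round(alpha_bound, 4))
```

Under this reading, the simulation value $\alpha_{01} = 0.3$ used in Section 4 sits just inside the admissible range for Gaussian errors.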

Remark 2. As an illustration, we verify that $\mathcal{R}(\theta_0)$ is positive definite. Indeed, it is nonnegative
definite, i.e., $c^T\mathcal{R}(\theta_0)c = E\{\sigma_t^4(\theta_0)(c^T Y_{t-1})^2\} \ge 0$ for any given vector $c = (c_0, \ldots, c_p)^T \in \mathbb{R}^{p+1}$.
Moreover, if we suppose that $\mathcal{R}(\theta_0)$ is not positive definite, then, since $\sigma_t^2(\theta_0) > 0$ a.e.
because of $\omega_0 > 0$, there exists a vector $(c_0, \ldots, c_{j_0})$ with $c_{j_0} \ne 0$ ($j_0 \le p$) such that
$c_0 + c_1 X_{s-1}^2 + \cdots + c_{j_0} X_{s-j_0}^2 = 0$ a.e. In this case, we can write
$X_{s-j_0}^2 = -\gamma_0 - \gamma_1 X_{s-1}^2 - \cdots - \gamma_{j_0-1} X_{s-j_0+1}^2$, where $\gamma_k = c_k/c_{j_0}$. Hence, substituting this into the last
term of $\sigma_t^2(\theta_0)$ in (1) with $s - j_0 = t - p$ entails an ARCH(p - 1) representation, leading to
a contradiction.

The conditional least squares estimator $\hat\theta_n^{(LS)}$ admits a closed-form expression, which is
computationally easy. However, $\hat\theta_n^{(LS)}$ in general
is not asymptotically efficient. Thus we next discuss an asymptotically efficient estimator
based on the approach of Godambe (1985), which has the following desirable properties: (i) it has an
explicit form which is computationally easy; (ii) it compares favourably with the ML and
QML estimators.
Let $X^{(n)} = (X_1, \ldots, X_n)^T$ be a vector of random variables forming a stochastic process.
The distribution family $\mathbb{F}$ of $X^{(n)}$ is specified by an unknown parameter vector
$\theta = (\theta_1, \ldots, \theta_p)^T$, and $\theta = \theta(F)$, $F \in \mathbb{F}$, is a real parameter vector. An estimating function
$g(X^{(n)}, \theta(F))$ satisfying certain regularity conditions is called a regular unbiased estimating
function if

$$E_F[g(X^{(n)}, \theta(F))] = 0, \qquad F \in \mathbb{F}. \tag{4}$$

For a given set of unbiased estimating functions $g$ belonging to the class $\mathcal{G}$, the estimating
function $g^* \in \mathcal{G}$ is said to be optimal for $\theta$ if

$$E[g^2(X^{(n)}, \theta(F))] \big/ \{E[\partial g(X^{(n)}, \theta)/\partial\theta \,|_{\theta = \theta(F)}]\}^2$$

is minimized for all $F \in \mathbb{F}$ at $g = g^*$.

Let $\mathcal{L}$ be a class of linear combinations of unbiased estimating functions of the form

$$g = \sum_{t=1}^{n} a_{t-1} h_t,$$

where the weights $a_{t-1}$ are any function of $X_1, \ldots, X_{t-1}$ and $\theta$, and $h_t$ is a function of
$X_1, \ldots, X_t$ and $\theta$ satisfying $E_F(h_t \mid \mathcal{B}_{t-1}) = 0$, where $\mathcal{B}_t = \sigma\{X_s, s \le t\}$. Moreover, for all
$F \in \mathbb{F}$, the $h_t$'s are mutually orthogonal.
An obvious example of $h_t$ is

$$h_t = X_t - E(X_t \mid \mathcal{B}_{t-1}), \tag{5}$$

which is the residual between $X_t$ and its best predictor $E(X_t \mid \mathcal{B}_{t-1})$. We assume that $h_t$ and
$a_{t-1}$ are differentiable with respect to $\theta$ for $1 \le t \le n$. These considerations motivate the
following result, which is due to Godambe (1985).

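Lemma 2. In the class $\mathcal{L}$ of estimating functions $g$, the optimal estimating function is given
by

$$g^* = \sum_{t=1}^{n} a_{t-1}^* h_t,$$

where $a_{t-1}^* = E[(\partial h_t/\partial\theta) \mid \mathcal{B}_{t-1}] \big/ E[h_t^2 \mid \mathcal{B}_{t-1}]$.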

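By virtue of Lemmas 1 and 2, we are now in a position to state our main result. For this
purpose, we need the following notation. In view of (5), we can set

$$h_t = X_t^2 - E(X_t^2 \mid \mathcal{F}_{t-1}) = X_t^2 - \sigma_t^2(\theta_0). \tag{6}$$

In general, the choice of an estimating function can be viewed in a manner analogous to the
selection of moment conditions in the generalized method of moments (see Hansen (1982)).
Now based on (6), we have

$$E[(\partial h_t/\partial\theta) \mid \mathcal{F}_{t-1}] = -Y_{t-1} \quad \text{and} \quad E(h_t^2 \mid \mathcal{F}_{t-1}) = E(X_t^4 \mid \mathcal{F}_{t-1}) - \sigma_t^4(\theta_0).$$

Then by virtue of (6) and Lemma 2, the optimal estimating function is

$$g^* = -\sum_{t=1}^{n} \frac{Y_{t-1} h_t}{E(X_t^4 \mid \mathcal{F}_{t-1}) - \sigma_t^4(\theta_0)}. \tag{7}$$

It should be pointed out that (7) is based on the finite sample, and it does not depend
on any distributional assumptions for $X_t^2$ conditional on $\mathcal{F}_{t-1}$. Noting that $E(h_t^2 \mid \mathcal{F}_{t-1}) =
\mathrm{Var}(\varepsilon_t^2)\sigma_t^4(\theta_0)$ and using (3), it follows that the solution to $g^* = 0$ in (7) is the estimating
function (EF) estimator given by

$$\hat\theta_n^{(EF)} = \Big( \sum_{t=1}^{n} \frac{Y_{t-1}Y_{t-1}^T}{\sigma_t^4(\hat\theta_n^{(LS)})} \Big)^{-1} \sum_{t=1}^{n} \frac{Y_{t-1}X_t^2}{\sigma_t^4(\hat\theta_n^{(LS)})}. \tag{8}$$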


Here, it is assumed that $\sqrt{n}(\hat\theta_n^{(LS)} - \theta_0) = O_p(1)$. We now impose the following additional
regularity conditions. Recall the matrix $A_{0t}$ and write $A_0 = \{A_{0t}\}$. In the notation of
Bougerol and Picard (1992), the top Lyapunov exponent is defined by

$$\gamma(A_0) = \inf_{t \ge 1} \frac{1}{t} E(\log \|A_{01}A_{02} \cdots A_{0t}\|) = \lim_{t \to \infty} \frac{1}{t} \log \|A_{01}A_{02} \cdots A_{0t}\| \quad \text{a.s.}$$

under the assumption that $E(\log^+ \|A_{01}\|) \le E\|A_{01}\| < \infty$.
Assumption 2

(i) $\theta_0 \in \Theta^{\circ}$, where $\Theta^{\circ}$ denotes the interior of the compact parameter space $\Theta$.

(ii) $\gamma(A_0) < 0$.

(iii) $\varepsilon_t^2$ has a non-degenerate distribution with $E\varepsilon_t^2 = 1$.

(iv) $E\varepsilon_t^4 < \infty$.

(v) $\Gamma(\theta_0) = E(Y_{t-1}Y_{t-1}^T/\sigma_t^4(\theta_0))$ is finite.

Hence, we have the following theorem, which is the main result of the paper. The proofs of
Lemma 1 and Theorem 1 are given in Section 5.

Theorem 1. Suppose that the assumptions of model (1) and Assumption 2 hold. Then
as $n \to \infty$,

$$\sqrt{n}(\hat\theta_n^{(EF)} - \theta_0) \xrightarrow{d} N(0, \mathrm{Var}(\varepsilon_t^2)\,\Gamma^{-1}(\theta_0)).$$
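A two-step computation of the EF estimator can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the EF estimator is taken to be weighted least squares with weights $\sigma_t^{-4}$ evaluated at the LS estimate, as suggested by the form of $\Gamma(\theta_0)$ and by the weights $w_{S,t-1}$ of Section 3. The helper names and the guard that keeps the first-step variances positive are hypothetical additions.

```python
import numpy as np

def simulate_arch1(omega, alpha, n, rng, burn=200):
    """One presample value followed by n squared ARCH(1) observations (Gaussian innovations)."""
    total = n + burn + 1
    eps2 = rng.standard_normal(total) ** 2
    x2 = np.zeros(total)
    for t in range(1, total):
        x2[t] = (omega + alpha * x2[t - 1]) * eps2[t]
    return x2[burn:]

def ef_estimator(x2, p):
    """Return (theta_ls, theta_ef) for an ARCH(p) regression of X_t^2 on its lags."""
    n = len(x2) - p
    Y = np.column_stack([np.ones(n)] + [x2[p - j:p - j + n] for j in range(1, p + 1)])
    X = x2[p:]
    theta_ls = np.linalg.solve(Y.T @ Y, Y.T @ X)
    # Guard (an assumption, not in the paper): keep the first-step variances positive.
    start = theta_ls.copy()
    start[0] = max(start[0], 0.01 * X.mean())
    start[1:] = np.clip(start[1:], 0.0, None)
    w = (Y @ start) ** -2.0                  # weights sigma_t^{-4} at the first-step estimate
    Yw = Y * w[:, None]
    theta_ef = np.linalg.solve(Yw.T @ Y, Yw.T @ X)
    return theta_ls, theta_ef

rng = np.random.default_rng(2)
x2 = simulate_arch1(1.0, 0.2, 20_000, rng)
theta_ls, theta_ef = ef_estimator(x2, 1)
print("LS:", theta_ls, "EF:", theta_ef)
```

No numerical optimizer or starting value for the variance recursion is needed; both steps are linear solves.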



Remark 3. Under the assumption of conditional normality, we have $E(X_t^4 \mid \mathcal{B}_{t-1}) = 3\sigma_t^4(\theta_0)$
and, in analogy with Engle (1982), the expression (7) reduces to

$$g^* = -\sum_{t=1}^{n} \frac{Y_{t-1}\{X_t^2 - \sigma_t^2(\theta_0)\}}{2\sigma_t^4(\theta_0)}. \tag{9}$$

Comparing (9) with the first-order condition of Engle (1982), we observe that they are
equivalent up to a sign change. As is well known, under the additional assumption of
normality, the ML estimator of $\theta$ has the following asymptotic distribution:

$$\sqrt{n}(\hat\theta_n^{(ML)} - \theta_0) \xrightarrow{d} N(0, 2\Gamma^{-1}(\theta_0)), \quad \text{as } n \to \infty.$$

Hence, we conclude that the theory of estimating functions and the ML method yield
essentially the same asymptotic distribution for the estimators of the ARCH model.
Remark 4. As shown by Francq and Zakoian (2004) for the ARCH model, the QML
estimator of $\theta$ is obtained by maximising the normal log-likelihood function although the
true probability density function is non-normal. Under the conditions of model (1) and
Assumption 2, they showed that the QML estimator is asymptotically normal:

$$\sqrt{n}(\hat\theta_n^{(QML)} - \theta_0) \xrightarrow{d} N(0, \mathrm{Var}(\varepsilon_t^2)\,\Gamma^{-1}(\theta_0)), \quad \text{as } n \to \infty.$$

It is interesting to note that, if the true probability density function is the normal distribution,
the asymptotic distributions of the ML and QML estimators are identical and coincide
with ours.



Remark 5. Consider the GARCH(p, q) model

$$X_t = \sigma_t(\vartheta_0)\varepsilon_t, \qquad \sigma_t^2(\vartheta_0) = \omega_0 + \sum_{i=1}^{p} \alpha_{0i}X_{t-i}^2 + \sum_{j=1}^{q} \beta_{0j}\sigma_{t-j}^2(\vartheta_0), \qquad t = 1, \ldots, n, \tag{10}$$

where $\{\varepsilon_t\}$ is a sequence of independent, identically distributed random variables such that
$E\varepsilon_t = 0$, $E\varepsilon_t^2 = 1$, $\vartheta_0 = (\omega_0, \alpha_{01}, \ldots, \alpha_{0p}, \beta_{01}, \ldots, \beta_{0q})^T \in \Theta \subset (0, \infty) \times [0, \infty)^{p+q}$ is an
unknown vector of true parameters satisfying $\omega_0 > 0$, $\alpha_{0i} \ge 0$, $i = 1, \ldots, p$, $\beta_{0j} \ge 0$,
$j = 1, \ldots, q$, and $\varepsilon_t$ is independent of $X_s$, $s < t$. Necessary and sufficient conditions under
which the GARCH(p, q) equations have a unique, strictly stationary, and non-anticipative
solution were found by Nelson (1990) for $p = 1$ and $q = 1$, and by Bougerol and Picard
(1992a, b) for arbitrary $p \ge 1$ and $q \ge 1$.

Write $\zeta_t = (\varepsilon_t^2 - 1)\sigma_t^2(\vartheta_0)$. Then by analogy with (2), it follows that (10) can be
represented as an ARMA(p*, q) model:

$$X_t^2 = \omega_0 + \sum_{i=1}^{p^*} \phi_{0i}X_{t-i}^2 + \zeta_t - \sum_{j=1}^{q} \beta_{0j}\zeta_{t-j}, \tag{11}$$

where $p^* = \max\{p, q\}$ and $\phi_{0i} = \alpha_{0i} + \beta_{0i} \ge 0$, $i = 1, \ldots, p^*$. We have further defined $\alpha_{0i} = 0$
for $i > p$ and $\beta_{0j} = 0$ for $j > q$. Henceforth, it is assumed that $X_t^2$ is covariance-stationary,
which holds provided that $\zeta_t$ has finite variance and that the roots of $1 - \phi_{01}z - \cdots - \phi_{0p^*}z^{p^*} = 0$ are
outside the unit circle. Given the nonnegativity restriction, this means that $X_t^2$ is covariance-stationary
if $\phi_{01} + \cdots + \phi_{0p^*} < 1$.

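Suppose that an observed stretch $X_1^2, \ldots, X_n^2$ is available from $\{X_t^2\}$. Let $R(l)$ denote
the autocovariance function at lag $l$,

$$R(l) = E(X_t^2 - \mu)(X_{t+l}^2 - \mu), \qquad l = 0, \pm 1, \ldots,$$

where $\mu = E(X_t^2) = \omega_0/(1 - \phi_{01} - \cdots - \phi_{0p^*})$, with the corresponding estimator

$$\hat R(l) = \frac{1}{n}\sum_{t=1}^{n-|l|} (X_t^2 - \hat\mu)(X_{t+|l|}^2 - \hat\mu),$$

where $\hat\mu = \frac{1}{n}\sum_{t=1}^{n} X_t^2$. Here the initial values $X_0^2 = \cdots = X_{1-p^*}^2 = \zeta_0 = \cdots = \zeta_{1-q} = 0$
have negligible effect on parameter estimates when the sample size is large. The expression

$$I_n(\lambda) = \frac{1}{2\pi n}\Big|\sum_{t=1}^{n} (X_t^2 - \hat\mu)e^{it\lambda}\Big|^2 = \frac{1}{2\pi}\sum_{l=-n+1}^{n-1} \hat R(l)e^{-il\lambda}$$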
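is called the periodogram of the partial realization of $\{X_t^2\}$. The vector of parameters is
$\psi = (\omega, \phi_1, \ldots, \phi_{p^*}, \beta_1, \ldots, \beta_q)^T \in \Theta$. By stationarity and ergodicity, set $\sigma_\zeta^2 = E\zeta_t^2 > 0$
and write $\rho = (\psi^T, \sigma_\zeta^2)^T$. Then the spectral density of $\{X_t^2\}$ is

$$f_\rho(\lambda) = \frac{\sigma_\zeta^2}{2\pi}\,\frac{\big|1 - \sum_{j=1}^{q}\beta_j e^{ij\lambda}\big|^2}{\big|1 - \sum_{k=1}^{p^*}\phi_k e^{ik\lambda}\big|^2}.$$

In order to estimate $\rho$, Hosoya and Taniguchi (1982) proposed to minimise

$$D(f_\rho, I_n) = \int_{-\pi}^{\pi}\Big\{\log f_\rho(\lambda) + \frac{I_n(\lambda)}{f_\rho(\lambda)}\Big\}\,d\lambda$$

with respect to $\rho$. Let $\hat\rho_n^{(QML)}$ be a
quasi-Gaussian maximum likelihood estimator of $\rho$ which minimizes $D(f_\rho, I_n)$. Under
certain conditions, they showed that

$$\sqrt{n}(\hat\rho_n^{(QML)} - \rho_0) \xrightarrow{d} N\Big(0,\ 4\pi\Big(\int_{-\pi}^{\pi}\frac{\partial}{\partial\rho}\log f_\rho(\lambda)\,\frac{\partial}{\partial\rho^T}\log f_\rho(\lambda)\,d\lambda\Big)^{-1}\Big).$$

Note that (11) does not take into account the nuisance parameter $\vartheta_0$ associated with variance.
Hence, $\hat\rho_n^{(QML)}$ indeed serves only as an initial estimator.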

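The vector of parameters is $\vartheta = (\omega, \alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q)^T \in \Theta$. To apply Godambe's
estimating function method to (11), let

$$\hat\sigma_t^2 = \hat\omega + \sum_{i=1}^{p}\hat\alpha_i X_{t-i}^2 + \sum_{j=1}^{q}\hat\beta_j\hat\sigma_{t-j}^2,$$

with $\hat\sigma_0^2 = \hat\sigma_{-1}^2 = \cdots = \hat\sigma_{1-q}^2 = 0$. Then we can construct $\hat\sigma_t^2$, $t = 1, \ldots, n$, iteratively. Once
this is done, we can find an estimator $\hat\rho_n^{(EF)} = \hat\rho_n^{(EF)}(\hat\rho_n^{(QML)})$ of $\rho$ by means of Godambe's
method.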
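The iterative construction of the conditional variances can be sketched as below. This is a minimal illustration, assuming the recursion has the GARCH(p, q) form of (10) with all presample squared observations and variances set to zero; the function name `garch_sigma2` is hypothetical, and the coefficients passed in would be the initial estimates.

```python
import numpy as np

def garch_sigma2(x2, omega, alphas, betas):
    """Build sigma_t^2 = omega + sum_i alpha_i X_{t-i}^2 + sum_j beta_j sigma_{t-j}^2 iteratively.

    x2 holds the squared observations; presample squared observations and
    presample variances are treated as zero (the start-up convention assumed here).
    """
    p, q = len(alphas), len(betas)
    sigma2 = np.zeros(len(x2))
    for t in range(len(x2)):
        arch = sum(alphas[i] * x2[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        garch = sum(betas[j] * sigma2[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        sigma2[t] = omega + arch + garch
    return sigma2

# Small hand-checkable example with p = q = 1.
print(garch_sigma2(np.array([1.0, 2.0, 3.0]), omega=1.0, alphas=[0.5], betas=[0.5]))
```

With the variances in hand, the weights $\hat\sigma_t^{-4}$ can be formed exactly as in the ARCH case.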



To gain further insight into Theorem 1, it is interesting to compare the asymptotic
variances of $\hat\theta_n^{(LS)}$ and $\hat\theta_n^{(EF)}$ in terms of their efficiency. This motivates the following
result, whose proof is given in Section 5.

Theorem 2. Under the conditions of Lemmas 1 and 2, the asymptotic variance of $\hat\theta_n^{(LS)}$
satisfies the inequality $\mathrm{Var}(\varepsilon_t^2)\,\mathcal{U}^{-1}\mathcal{R}(\theta_0)\,\mathcal{U}^{-1} \ge \mathrm{Var}(\varepsilon_t^2)\,\Gamma^{-1}(\theta_0)$.
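The ordering in Theorem 2 can be illustrated numerically by replacing the expectations with sample averages. The sketch below assumes an ARCH(1) process with Gaussian innovations and known $\theta_0$; the common factor $\mathrm{Var}(\varepsilon_t^2)$ is dropped because it multiplies both sides. All function names are illustrative.

```python
import numpy as np

def simulate_arch1(omega, alpha, n, rng, burn=200):
    """One presample value followed by n squared ARCH(1) observations (Gaussian innovations)."""
    total = n + burn + 1
    eps2 = rng.standard_normal(total) ** 2
    x2 = np.zeros(total)
    for t in range(1, total):
        x2[t] = (omega + alpha * x2[t - 1]) * eps2[t]
    return x2[burn:]

def variance_gap_eigenvalues(x2, theta0):
    """Eigenvalues of U^{-1} R U^{-1} - Gamma^{-1}, with moments replaced by sample averages."""
    Y = np.column_stack([np.ones(len(x2) - 1), x2[:-1]])   # rows Y_{t-1}' = (1, X_{t-1}^2)
    sigma4 = (Y @ theta0) ** 2                              # sigma_t^4(theta_0)
    n = len(Y)
    U = Y.T @ Y / n
    R = (Y * sigma4[:, None]).T @ Y / n
    G = (Y / sigma4[:, None]).T @ Y / n
    U_inv = np.linalg.inv(U)
    gap = U_inv @ R @ U_inv - np.linalg.inv(G)
    return np.linalg.eigvalsh((gap + gap.T) / 2)            # symmetrize before eigvalsh

rng = np.random.default_rng(3)
x2 = simulate_arch1(1.0, 0.2, 5_000, rng)
print(variance_gap_eigenvalues(x2, np.array([1.0, 0.2])))
```

Nonnegative eigenvalues correspond to the LS asymptotic variance dominating that of the EF estimator in the matrix sense; the gap shrinks as $\alpha_{01}$ approaches zero, where $\sigma_t^2$ is nearly constant.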



3. Influence function 



An influence function is a statistical tool which provides rich qualitative information on
how an estimator responds to a small amount of contamination at any point. In the following,
we introduce a robustness measure for the estimator given by (8). This robustness measure,
based on the influence function, shows how the initial estimator $\hat\theta_n^{(LS)}$ affects $\hat\theta_n^{(EF)}$.

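Let us first study a robustness property of $\hat\theta_n^{(LS)}$. Write $S_t = (X_t^2, \ldots, X_{t-p+1}^2)^T$ and
$Z_{S,t} = (1, S_t^T)^T$. Then we have $\hat\theta_n^{(LS)} = \hat U_S^{-1}\hat\gamma_S$, where

$$\hat\gamma_S = \frac{1}{n}\sum_{t=1}^{n} S_t^{(1)}Z_{S,t-1} \quad \text{and} \quad \hat U_S = \frac{1}{n}\sum_{t=1}^{n} Z_{S,t-1}Z_{S,t-1}^T.$$

Here, $S_t^{(1)}$ is the first component of $S_t$. We can now define the corresponding least squares
functional as

$$T_S = U_S^{-1}\gamma_S,$$

where

$$\gamma_S = E(S_t^{(1)}Z_{S,t-1}) \quad \text{and} \quad U_S = E(Z_{S,t-1}Z_{S,t-1}^T).$$

As a measure of the robustness of $\hat\theta_n^{(LS)}$, we consider the following contaminated
process

$$S_{\delta,t} = (1 - \delta)S_t + \delta U_t = S_t + \delta V_t,$$

where $\delta \in (0, 1)$. For $S^\delta = \{S_{\delta,t}\}$, we can introduce an influence function

$$T_S' = \lim_{\delta \downarrow 0} \frac{T_{S^\delta} - T_S}{\delta}.$$

Noting the formula for differentiation of the inverse of a matrix, $dA^{-1} = -A^{-1}(dA)A^{-1}$, we
obtain

$$\frac{d}{d\delta}U_{S^\delta}^{-1}\Big|_{\delta=0} = -U_S^{-1}(A + A^T)U_S^{-1}, \tag{12}$$

where $A = E(\mathbf{V}_{t-1}Z_{S,t-1}^T)$ with $\mathbf{V}_t = (0, V_t^T)^T$. Also,

$$\frac{d}{d\delta}\gamma_{S^\delta}\Big|_{\delta=0} = E(V_t^{(1)}Z_{S,t-1}) + E(S_t^{(1)}\mathbf{V}_{t-1}) \equiv \dot\gamma_S, \tag{13}$$

where $V_t^{(1)}$ is the first component of $V_t$. Hence,

$$T_S' = U_S^{-1}[\dot\gamma_S - (A + A^T)T_S].$$

The quantity $T_S'$ will reveal how outliers in the dependent and independent variables may
combine to affect $\hat\theta_n^{(LS)}$.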

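In the above notation, we can similarly derive an influence function of $\hat\theta_n^{(EF)} = \hat U_{S,w}^{-1}\hat\gamma_{S,w}$,
where

$$\hat\gamma_{S,w} = \frac{1}{n}\sum_{t=1}^{n} S_t^{(1)}\hat w_{S,t-1}Z_{S,t-1} \quad \text{and} \quad \hat U_{S,w} = \frac{1}{n}\sum_{t=1}^{n} \hat w_{S,t-1}Z_{S,t-1}Z_{S,t-1}^T,$$

with

$$\hat w_{S,t} = [(\hat\theta_n^{(LS)})^T Z_{S,t}Z_{S,t}^T\hat\theta_n^{(LS)}]^{-1}.$$

Here note that $\hat w_{S,t-1} = \sigma_t^{-4}(\hat\theta_n^{(LS)})$. Since $\hat\gamma_{S,w}$ and $\hat U_{S,w}$ are the respective sample versions
of

$$\gamma_{S,w} = E[S_t^{(1)}w_{S,t-1}Z_{S,t-1}] \quad \text{and} \quad U_{S,w} = E[w_{S,t-1}Z_{S,t-1}Z_{S,t-1}^T],$$

with

$$w_{S,t} = [T_S^T Z_{S,t}Z_{S,t}^T T_S]^{-1},$$

the functional analogue of $T_S$ is

$$T_{S,w} = U_{S,w}^{-1}\gamma_{S,w}.$$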

Write 

Qt^VtZl, and L^,* = ^2 (r^)^g,T^. 
Then by analogy with (12),



5=0 

where 

= E{ws,t-iVt-iZst_i) and = E{Ly,^t-iZs,t-iZs,t-i) 

and with (13), 



d 
d5 



5=0 



= E{v}'^ws,t-iZs,t-i) + E{si'^ws,t-iVt-i) 
-E{L^,t-iZs,t-i) - E{L^ t_iZs,t-i) 



= is,w 



Hence 



This expression will facilitate the fundamental description of sensitiveness or insensitiveness



4. Simulations

A finite-sample experiment is performed to assess the efficiency of the EF estimator
given by (8) relative to the LS, ML and QML estimators for a small and a large sample
of observations.

To facilitate meaningful comparisons, we generate an ARCH(1) process of length n = 50 or
n = 500 for values of $\theta_0 = (\omega_0, \alpha_{01}) = (1, 0.1)$ or $(1, 0.3)$ based on 5000 replications. Without
loss of generality, we assume $\omega_0 = 1$ and estimate $\alpha_{01}$ using the LS, ML, QML and EF
methods. For this purpose, we consider two error distributions: a normal distribution and
a Student t-distribution with $\nu = 5$ degrees of freedom. In both cases, the
mean and variance of the errors are 0 and 1, respectively.
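The data-generating step of this design can be sketched as follows. It is an illustration only: the rescaling of the Student t draws by $\sqrt{(\nu - 2)/\nu}$ to obtain unit variance, the burn-in length, and the function names are assumptions, since the paper does not describe its implementation.

```python
import numpy as np

def innovations(dist, size, rng, nu=5):
    """Zero-mean, unit-variance innovations: standard normal or rescaled Student t."""
    if dist == "normal":
        return rng.standard_normal(size)
    if dist == "t":
        # Var(t_nu) = nu / (nu - 2) for nu > 2, so rescale to unit variance.
        return rng.standard_t(nu, size) * np.sqrt((nu - 2) / nu)
    raise ValueError(f"unknown distribution: {dist}")

def simulate_arch1(omega, alpha, n, dist, rng, burn=200):
    """Return n observations X_t from an ARCH(1) model after discarding a burn-in."""
    eps = innovations(dist, n + burn, rng)
    x = np.zeros(n + burn)
    prev_sq = 0.0                            # zero start-up value for X_0^2
    for t in range(n + burn):
        x[t] = np.sqrt(omega + alpha * prev_sq) * eps[t]
        prev_sq = x[t] ** 2
    return x[burn:]

rng = np.random.default_rng(4)
for dist in ("normal", "t"):
    for alpha in (0.1, 0.3):
        x = simulate_arch1(1.0, alpha, 500, dist, rng)
        print(dist, alpha, "sample mean of X_t^2:", round(float(np.mean(x ** 2)), 3))
```

The sample mean of $X_t^2$ should be near $\omega_0/(1 - \alpha_{01})$ for long series, which gives a quick sanity check on the generator.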

Tables 1 and 2 report the results in terms of squared bias, variance and mean square error (MSE)
for the estimators $\hat\alpha_{1n}^{(LS)}$, $\hat\alpha_{1n}^{(ML)}$, $\hat\alpha_{1n}^{(QML)}$ and $\hat\alpha_{1n}^{(EF)}$ of $\alpha_{01}$. A closer examination of the MSE values in the
tables reveals some interesting features. We first note that the values are stable
with respect to the choice of parameters and sample sizes. In every case, we observe that
the finite-sample MSE of the EF estimator is no larger than that of the alternatives, namely the
LS, ML and QML methods. The efficiency of $\hat\alpha_{1n}^{(EF)}$ increases against its counterparts as the
sample size increases or $\alpha_{01}$ decreases. In the case of normality, the MSE results for the ML,
QML and EF estimators are approximately the same. Note also that the results of
Tables 1 and 2 hold true for normalized error distributions such as the double exponential,
logistic and gamma distributions having zero mean and unit variance.
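The decomposition reported in the tables (squared bias, variance, MSE) can be computed from replicated estimates as sketched below. This sketch covers only the LS and EF estimators for ARCH(1) with Gaussian errors, uses far fewer replications than the paper, and relies on hypothetical helper names and a positivity guard for the first-step estimate; it is not expected to reproduce the tabulated values.

```python
import numpy as np

def simulate_arch1_panel(omega, alpha, n, reps, rng, burn=200):
    """Squared ARCH(1) paths, one column per replication (n + 1 rows, Gaussian innovations)."""
    total = n + burn + 1
    eps2 = rng.standard_normal((total, reps)) ** 2
    x2 = np.zeros((total, reps))
    for t in range(1, total):
        x2[t] = (omega + alpha * x2[t - 1]) * eps2[t]
    return x2[burn:]

def ls_and_ef_alpha(x2):
    """LS and two-step EF estimates of alpha_1 from one path of squared observations."""
    Y = np.column_stack([np.ones(len(x2) - 1), x2[:-1]])
    X = x2[1:]
    th_ls = np.linalg.solve(Y.T @ Y, Y.T @ X)
    # Guard (an assumption, not in the paper): keep first-step variances positive.
    start = np.array([max(th_ls[0], 0.01 * X.mean()), max(th_ls[1], 0.0)])
    w = (Y @ start) ** -2.0
    Yw = Y * w[:, None]
    th_ef = np.linalg.solve(Yw.T @ Y, Yw.T @ X)
    return th_ls[1], th_ef[1]

def mse_summary(estimates, true_value):
    """Squared bias, variance (ddof=0) and MSE, so that MSE = bias^2 + variance exactly."""
    est = np.asarray(estimates, dtype=float)
    return {"bias2": (est.mean() - true_value) ** 2,
            "var": est.var(),
            "mse": np.mean((est - true_value) ** 2)}

rng = np.random.default_rng(5)
panel = simulate_arch1_panel(1.0, 0.1, n=500, reps=200, rng=rng)
results = np.array([ls_and_ef_alpha(panel[:, r]) for r in range(panel.shape[1])])
print("LS:", mse_summary(results[:, 0], 0.1))
print("EF:", mse_summary(results[:, 1], 0.1))
```

Using the population-style variance (`ddof=0`) makes the three reported quantities add up exactly, matching the "squared bias, variance, MSE" layout of the tables.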

Our simulation results highlight the benefits of using the EF formulation for modelling
data drawn from non-normal conditional distributions. This approach naturally takes
advantage of departures from normality to improve the efficiency of estimators given a finite
sample of data. In comparison with asymptotically based methods, the focus on the finite
sample in the EF formulation is important. We observe that efficiency gains from the EF
approach are substantial. Hence, the EF formulation is potentially useful in cases with serious
departures from normality in which efficiency is important.
Table 1: MSE of the LS, ML, QML and EF estimators for $\alpha_{01}$ = 0.1, 0.3 with standard
normal errors

                                    LS                ML                QML               EF
Parameter                      n=50    n=500     n=50    n=500     n=50    n=500     n=50    n=500

$\alpha_{01}$ = 0.1  Sq. bias  0.1195  0.0098    0.0120  0.0013    0.0120  0.0013    0.0062  0.0012
                     Variance  0.7150  0.0953    0.5668  0.0524    0.5668  0.0524    0.5670  0.0524
                     MSE       0.8345  0.1051    0.5788  0.0537    0.5788  0.0537    0.5732  0.0536

$\alpha_{01}$ = 0.3  Sq. bias  0.1037  0.0978    0.0867  0.0420    0.0867  0.0420    0.0850  0.0405
                     Variance  0.7554  0.1899    0.6156  0.1324    0.6156  0.1324    0.5702  0.1320
                     MSE       0.8591  0.2877    0.7023  0.1744    0.7023  0.1744    0.6552  0.1725

Note: The three values in each cell are, from top to bottom: squared bias, variance, and MSE.



Table 2: MSE of the LS, ML, QML and EF estimators for $\alpha_{01}$ = 0.1, 0.3 with $t_5$ distributed
errors

                                    LS                ML                QML               EF
Parameter                      n=50    n=500     n=50    n=500     n=50    n=500     n=50    n=500

$\alpha_{01}$ = 0.1  Sq. bias  0.2314  0.0124    0.0203  0.0088    0.0111  0.0050    0.0078  0.0013
                     Variance  0.7808  0.1090    0.6010  0.0552    0.5797  0.0537    0.5633  0.0556
                     MSE       1.0123  0.1214    0.6213  0.0640    0.5908  0.0587    0.5711  0.0569

$\alpha_{01}$ = 0.3  Sq. bias  0.2132  0.1096    0.1710  0.1013    0.0987  0.0268    0.0836  0.0277
                     Variance  0.9875  0.1711    0.6575  0.1579    0.6432  0.1410    0.6482  0.1377
                     MSE       1.2007  0.2807    0.8285  0.2592    0.7419  0.1678    0.7318  0.1654

Note: The three values in each cell are, from top to bottom: squared bias, variance, and MSE.
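5. Proofs



In this section we provide the proofs of Lemma 1 and Theorems 1 and 2.

Proof of Lemma 1. Note from (3) that

$$\sqrt{n}(\hat\theta_n^{(LS)} - \theta_0) = \Big(\frac{1}{n}\sum_{t=1}^{n} Y_{t-1}Y_{t-1}^T\Big)^{-1}\Big(n^{-1/2}\sum_{t=1}^{n} Y_{t-1}\eta_t\Big). \tag{14}$$

Since $\{X_t\}$ is stationary and ergodic, so is $\{Y_{t-1}Y_{t-1}^T\}$, whose expectation is finite by Assumption 1. Thus
by the ergodic theorem

$$\frac{1}{n}\sum_{t=1}^{n} Y_{t-1}Y_{t-1}^T \to \mathcal{U} \quad \text{a.s.}$$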

Next consider the second factor on the right side of (14). Let $c = (c_0, \ldots, c_p)^T$ be any
vector with $c \ne 0$. Recall that $\eta_t = (\varepsilon_t^2 - 1)\sigma_t^2(\theta_0)$. Then using the martingale central limit
theorem and the Cramér-Wold device, it follows that

$$n^{-1/2}\sum_{t=1}^{n} c^T Y_{t-1}\eta_t \xrightarrow{d} N(0, \mathrm{Var}(\varepsilon_t^2)\,c^T\mathcal{R}(\theta_0)c).$$

To prove this, note that $Y_{t-1}\eta_t$ is a martingale difference sequence since $E(Y_{t-1}\eta_t \mid \mathcal{F}_{t-1}) = 0$.
We now verify the conditional Lindeberg condition only, since the other conditions can be
verified easily.

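For all $\epsilon > 0$, we show that

$$L_n = \frac{1}{n}\sum_{t=1}^{n} E\{(\eta_t c^T Y_{t-1})^2 I(|\eta_t c^T Y_{t-1}| > n^{1/2}\epsilon) \mid \mathcal{F}_{t-1}\} = o_p(1), \tag{15}$$

where $I(\Omega)$ is the indicator function of the event $\Omega$. Observe that

$$L_n \le \frac{1}{n}\sum_{t=1}^{n} (\sigma_t^2(\theta_0)c^T Y_{t-1})^2 \times E\{(\varepsilon_t^2 - 1)^2[I(|\varepsilon_t^2 - 1| > n^{1/4}\epsilon^{1/2}) + I(|\sigma_t^2(\theta_0)c^T Y_{t-1}| > n^{1/4}\epsilon^{1/2})] \mid \mathcal{F}_{t-1}\}$$

$$= \frac{1}{n}\sum_{t=1}^{n} (\sigma_t^2(\theta_0)c^T Y_{t-1})^2 E\{(\varepsilon_t^2 - 1)^2 I(|\varepsilon_t^2 - 1| > n^{1/4}\epsilon^{1/2})\} + \mathrm{Var}(\varepsilon_t^2)\frac{1}{n}\sum_{t=1}^{n} (\sigma_t^2(\theta_0)c^T Y_{t-1})^2 I(|\sigma_t^2(\theta_0)c^T Y_{t-1}| > n^{1/4}\epsilon^{1/2})$$

$$= T_1 + \mathrm{Var}(\varepsilon_t^2)T_2.$$

Noting that $\mathrm{Var}(\varepsilon_t^2) < \infty$ and that

$$\frac{1}{n}\sum_{t=1}^{n} (\sigma_t^2(\theta_0)c^T Y_{t-1})^2 = E[(\sigma_t^2(\theta_0)c^T Y_{t-1})^2] + o_p(1),$$

we obtain $T_1 = o_p(1)$. Moreover, note from Assumption 1 that

$$E(T_2) = E[(\sigma_t^2(\theta_0)c^T Y_{t-1})^2 I(|\sigma_t^2(\theta_0)c^T Y_{t-1}| > n^{1/4}\epsilon^{1/2})] = o(1),$$

which implies $T_2 \xrightarrow{p} 0$. Hence (15) is satisfied and, by Slutsky's theorem, the assertion of
Lemma 1 follows.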

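Proof of Theorem 1. From (8) and (2), observe that

$$\sqrt{n}(\hat\theta_n^{(EF)} - \theta_0) = \Big(\frac{1}{n}\sum_{t=1}^{n} Y_{t-1}Y_{t-1}^T\sigma_t^{-4}(\hat\theta_n^{(LS)})\Big)^{-1}\Big(n^{-1/2}\sum_{t=1}^{n} Y_{t-1}\eta_t\,\sigma_t^{-4}(\hat\theta_n^{(LS)})\Big).$$

The result in the theorem can be proved if we show that

$$\frac{1}{n}\sum_{t=1}^{n} Y_{t-1}Y_{t-1}^T\{\sigma_t^{-4}(\hat\theta_n^{(LS)}) - \sigma_t^{-4}(\theta_0)\} = o_p(1) \tag{16}$$

and

$$n^{-1/2}\sum_{t=1}^{n} Y_{t-1}\eta_t\{\sigma_t^{-4}(\hat\theta_n^{(LS)}) - \sigma_t^{-4}(\theta_0)\} = o_p(1). \tag{17}$$

In view of Lemma 1, we note that $\hat\theta_n^{(LS)}$ is close to $\theta_0$ for sufficiently large $n$ and thus $\sigma_t^2(\hat\theta_n^{(LS)})$
behaves like $\sigma_t^2(\theta_0)$ for each $t = 1, \ldots, n$. This statement holds true if, for $\epsilon > 0$, there exists
$N_\epsilon$ such that $\|\hat\theta_n^{(LS)} - \theta_0\| < \epsilon$ for all $n > N_\epsilon$, with probability one. Consequently, we have

$$n^{-1/2}\sum_{t=1}^{n} Y_{t-1}\eta_t\,\sigma_t^{-4}(\theta_0) \xrightarrow{d} N(0, \mathrm{Var}(\varepsilon_t^2)\,\Gamma(\theta_0)),$$

for which the proof is reduced to that of Lemma 1. Since the proofs of (16) and (17) are
similar, we prove (16) only, as follows.

By a Taylor expansion of $\sigma_t^{-4}(\hat\theta_n^{(LS)})$ around $\theta_0$, it follows that the left-hand side of (16) is dominated by
a term evaluated at an intermediate point $\theta^*$ that lies between $\theta_0$ and $\hat\theta_n^{(LS)}$. Moreover, the ergodic theorem
applies to the averaged term in this bound, and (16) follows.

Hence the conclusion of Theorem 1 follows.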

Proof of Theorem 2. The proof of this theorem requires the following matrix inequality
(see Kholevo (1969)).

Lemma 3. Let $A(\omega)$ and $B(\omega)$ be $r \times s$ and $t \times s$ random matrices, respectively, and let
$\psi(\omega)$ be a function that is positive everywhere. If $\{E(BB^T/\psi)\}^{-1}$ exists, then

$$E(AA^T\psi) \ge E(AB^T)\{E(BB^T/\psi)\}^{-1}\{E(AB^T)\}^T.$$

The equality holds if and only if there exists a constant $r \times t$ matrix $C$ such that $\psi A + CB = 0$
almost everywhere.

Using the notation of Lemma 3, write $A = B = Y_{t-1}$ and $\psi = \sigma_t^4(\theta_0)$. Then it follows
that $\mathcal{R}(\theta_0) \ge \mathcal{U}\Gamma^{-1}(\theta_0)\mathcal{U}$. It is obvious from Lemma 3 that the equality $\mathcal{R}(\theta_0) = \mathcal{U}\Gamma^{-1}(\theta_0)\mathcal{U}$
holds if and only if $\sigma_t^4(\theta_0) = k$, a constant, almost everywhere. Multiplying on both sides by
$\mathcal{U}^{-1}$, we get the desired result.